MAXIMALLY PERSISTENT CYCLES IN RANDOM GEOMETRIC COMPLEXES


OMER BOBROWSKI, MATTHEW KAHLE, AND PRIMOZ SKRABA 


Abstract. We initiate the study of persistent homology of random geometric
simplicial complexes. Our main interest is in maximally persistent cycles of degree
$k$ in persistent homology, for either the Čech or the Vietoris-Rips filtration built
on a uniform Poisson process of intensity $n$ in the unit cube $[0,1]^d$. This is a
natural way of measuring the largest “$k$-dimensional hole” in a random point set.
This problem is in the intersection of geometric probability and algebraic topology,
and is naturally motivated by a probabilistic view of topological inference.

We show that for all $d \ge 2$ and $1 \le k \le d-1$ the maximally persistent cycle
has (multiplicative) persistence of order

$$\Theta\left(\left(\frac{\log n}{\log\log n}\right)^{1/k}\right)$$

with high probability, characterizing its rate of growth as $n \to \infty$. The implied
constants depend on $k$, $d$, and on whether we consider the Vietoris-Rips or Čech
filtration.


1. Introduction 

The study of topological properties of random graphs has a long history, dating
back to classical results on the connectivity, cycles, and largest components in
Erdős-Rényi graphs [201 El]. Generalizations have been developed in several directions.
One direction is to consider different models of random graphs (see, e.g. [131 US]).
Another direction is to consider higher-dimensional topological properties, resulting in
the study of random simplicial complexes rather than random graphs, where in addition
to vertices and edges the structure also consists of triangles, tetrahedra and
higher-dimensional simplexes (see, e.g. llEHlIinillS). The study of random simplicial
complexes focuses mainly on their homology, which is a natural generalization
of the notions of connected components and cycles in graphs. Homology is an algebraic
topology framework that is used to study cycles in various dimensions, where
(loosely speaking) a $k$-dimensional cycle can be thought of as the boundary of a
$(k+1)$-dimensional solid (see Section 2.1 for more details).

In random geometric simplicial complexes, the vertices are generated by a random
point process (e.g. Poisson) in a metric space, and then geometric conditions are
applied to determine which of the simplexes should be included in the complex.
The two most studied models are the random Čech and Vietoris-Rips complexes
(see Section 2.2 for definitions). Several recent papers have studied various aspects
of the topology of these complexes (see [a cni [El Eni in EB [52] and the survey
[9]). These papers contain theorems which characterize the phase transitions where
homology appears and disappears, estimates for the Betti numbers (the number of
$k$-dimensional cycles), limiting distributions, etc. While this line of research presents
a deep and interesting theory, it is also motivated by data analysis applications.

Topological data analysis (TDA) is a recently emerging field that focuses on
extracting topological features from sampled data, and uses them as an input for various
data analytic and statistical algorithms. The main idea behind it is that topological
properties could help us understand the structure underlying the data, and provide
us with a set of features that are robust to various types of deformations (cf.
[niiiHiisii). Geometric complexes play a key role in computing topological features
from a finite set of data points. The construction of these complexes usually depends
on one or more parameters (e.g. the radius of balls drawn around the sample points),
and the ability to properly extract topological features depends on choosing this
parameter correctly. One of the most powerful tools in TDA is a multi-scale version of
homology, called persistent homology (see Section 2.3), which was developed mainly to
solve this problem of sensitive parameter tuning. In persistent homology, instead of
finding the best parameter values, one considers the entire range of possible values.
As the parameter values change, the observed topological features change (e.g. cycles
are created and filled in). Persistent homology tracks these changes and provides a
way to measure the significance of the features that show up in this process. One
way to represent the information provided by persistent homology is via barcodes,
see Figure 2. Here, every bar corresponds to a feature in the data and its endpoints
correspond to the times (parameter values) where the feature was created and
terminated. The underlying philosophy in TDA is that topological features that survive
(or persist) through a long range of parameter values are significant and related to
real topological structures in the data (or the “topological signal”), whereas ones
with a shorter lifespan are artifacts of the finite sampling, and correspond to noise
(see [32]). This approach motivates the following question: How long does a “long
range” of parameters (or a long bar in the barcode) have to be in order to be
considered significant? Phrased differently: how long should we expect this range to
be, if the sample points were entirely random, without any underlying structure or
features? This is the main question we try to answer in this paper.

To be more specific, in this paper we study the case where the data points are
generated by a homogeneous Poisson process in the unit $d$-dimensional cube $[0,1]^d$
($d > 1$) with intensity $n$, denoted by $\mathcal{P}_n$. We consider the persistent homology of
both the Čech complex $\mathcal{C}(\mathcal{P}_n, r)$ and the Rips complex $\mathcal{R}(\mathcal{P}_n, r)$, where the scale
parameter $r$ is the radius of the balls used to create these complexes (see Section 2.2).
We denote by $\Pi_k(n)$ the maximal persistence of a cycle in the degree $k$ persistent
homology ($1 \le k \le d-1$) of either the Čech or the Rips complex. Note that $\Pi_k(n)$
is a property of the persistent homology, where we consider all possible radii, and
therefore it does not depend on $r$. Our main result shows that, with high probability,

$$\Pi_k(n) = \Theta\left(\left(\frac{\log n}{\log\log n}\right)^{1/k}\right),$$

in the sense that $\Pi_k(n)$ can be bounded from above and below by a matching term up
to a constant factor. The precise definitions and statements are presented in Section 3.

The proofs for the upper and lower bounds require very different techniques. To
prove the upper bound we present a novel ‘isoperimetric-type’ statement (Lemma
4.1) that links the persistence of a cycle to the number of vertices that are used to
form it. The lower bound proof uses an exhaustive search for a specific construction
that guarantees the creation of a persistent cycle.

In addition to proving the theoretical result, we also present extensive
numerical experiments confirming the computed bounds and empirically computing
the implied constants. These results also suggest a conjectural law of large numbers.
Finally, we note that while the results in this paper are presented for the homogeneous
Poisson process on a $d$-dimensional cube, they should hold with minor adjustments
for non-homogeneous processes as well as for shapes other than the cube. We also
predict that our statements will hold for more generic point processes (e.g. weakly
sub-Poisson processes), using some of the statements made in [5T]. The detailed
analysis of these more generic cases is left as future work.

Earlier work: The study of the topology of random geometric complexes has
been growing rapidly in the past decade. Most of the results so far are related
to homology rather than persistent homology (i.e. fixing the parameter value). The
study in [T^l EH] focuses mainly on the phase transitions for appearance and vanishing
of homology, which can be viewed as higher-dimensional generalizations of the phase
transition for connectivity in random graphs. In [TJ [TUI SH E3 more emphasis was
given to the distribution of the Betti numbers, namely the number of cycles that
appear. Similar questions for more general point processes have also been considered
in lai. In [21 m] simplicial complexes generated by distributions with an unbounded
support were studied from an extreme value theory perspective. The recent survey
[9] overviews recent progress in this area.

The study of random persistent homology, on the other hand, is at its very initial
stages. Recall that the 0-th homology represents the connected components in a space.
Thus, the results in [31 SSI about the connectivity threshold in random geometric
graphs could be viewed as related to the 0-th persistent homology of either the
Čech or the Rips complex. The first study of persistent homology in degree $k \ge 1$
for a random setting was for $n$ points chosen i.i.d. uniformly on a circle by Bubenik
and Kim [15]. In this setting, they used the theory of order statistics to describe the
limiting distribution of the persistence diagram. Another direction of study is the
persistence diagrams of random functions. In [8], the authors study the “persistent
Euler characteristic” of Gaussian random fields.

Another line of research (see e.g. dH 1201 EH [22112SI IMl E2]) focuses on statistical
inference using persistent homology, and includes results about confidence intervals,
consistency and robustness for topological estimation, subsampling and bootstrapping
methods, and more.

Finally, we point out the earlier work in geometric probability [5], measuring the
largest convex hole for a set of random points in a convex planar region $R$. A convex
hole is generated when there is a subset of points for which the convex hull is empty
(i.e. contains no other points from the set). The size of a convex hole is then measured
combinatorially, as the number of vertices generating the hole. In [5] it is shown that
the largest hole has $\Theta(\log n/\log\log n)$ vertices, regardless of the shape of the ambient
convex region $R$. In this paper we are also measuring the size of the largest hole, but
in a very different sense. We are using the algebraic-topological notion of holes (via
persistent homology), rather than the combinatorial notion of counting vertices, so as far
as we can tell the fact that these two ways of measuring the size of the largest hole
have the same rate of growth (when $d = 2$ and $k = 1$) is something of a coincidence.

As far as we know, this article presents the first detailed probabilistic analysis for
persistent $k$-th homology of random geometric complexes, for $k \ge 1$.

The structure of the paper is as follows. In Section 2 we provide the topological
and probabilistic building blocks we will use throughout the paper. In Section 3 we
present the main result: the asymptotic behavior of maximally persistent cycles. In
Sections 4 and 5 we provide the main parts of the proof for the random Čech complex
(upper and lower bounds, respectively). Some parts of the proofs require more
knowledge in algebraic topology than the others, and we present those in Section 6
(including the proof for the Rips complex). Finally, we present simulation
results, complementing the main (asymptotic) result of the paper.

2. Background 

In this section we provide a brief introduction to the topological and probabilistic 
notions used in this paper. 
2.1. Homology. We wish to introduce the concept of homology here in an intuitive
rather than a rigorous way. For a comprehensive introduction to homology, see
[5U] or [13]. Let $X$ be a topological space, and $\mathbb{F}$ a field. The homology of $X$ with
coefficients in $\mathbb{F}$ is a set of vector spaces $\{H_k(X)\}_{k=0}^{\infty}$, which are topological
invariants of $X$ (i.e. they are invariant under homeomorphisms). We note that the standard
notation is $H_k(X, \mathbb{F})$, where $\mathbb{F}$ denotes the coefficient ring, but we suppress the field
and let $H_k(X)$ denote homology with $\mathbb{F}$ coefficients throughout this article.

The dimension of the zeroth homology $H_0(X)$ is equal to the number of connected
components of $X$. For $k \ge 1$, the basis elements of the $k$-th homology $H_k(X)$
correspond to $k$-dimensional “holes” or (nontrivial) “cycles” in $X$. An intuitive
way to think about a $k$-dimensional cycle is as the result of taking the boundary
of a $(k+1)$-dimensional body. For example, if $X$ is a circle then $H_0(X) = \mathbb{F}$ and
$H_1(X) = \mathbb{F}$. If $X$ is a 2-dimensional sphere then $H_0(X) = \mathbb{F}$ and $H_2(X) = \mathbb{F}$, while
$H_1(X) = \{0\}$ (since every loop on the sphere can be shrunk to a point). In general,
if $X$ is an $n$-dimensional sphere, then

$$H_k(X) = \begin{cases} \mathbb{F} & k = 0, n, \\ 0 & \text{otherwise.} \end{cases}$$


We will use $H_*(X)$ when making a statement that applies to all the homology
groups simultaneously. In addition to providing information about spaces, homology
is also used to study mappings between spaces. If $f : X \to Y$ is a map between two
topological spaces, then it induces a map in homology $f_* : H_*(X) \to H_*(Y)$. This
map is a linear transformation between vector spaces which tells us how cycles in
$X$ map to cycles in $Y$. These mappings are important when discussing persistent
homology.

Finally, we say that two spaces $X, Y$ are homotopy equivalent, denoted by $X \simeq Y$,
if $X$ can be continuously deformed to $Y$ (loosely speaking). In particular, if $X \simeq Y$
then $H_*(X) = H_*(Y)$ (isomorphic). For example, a circle, an empty triangle and an
annulus are all homotopy equivalent.

2.2. The Čech and Vietoris-Rips complexes. As mentioned earlier, the Čech
and the Rips complexes are often used to extract topological information from data.
These complexes are abstract simplicial complexes [36] and in our case will be
generated by a set of points in $\mathbb{R}^d$. These complexes are tied together with the union of
balls we define as

(2.1)  $\mathcal{U}(\mathcal{P}, r) = \bigcup_{p \in \mathcal{P}} B_r(p),$

where $\mathcal{P} \subset \mathbb{R}^d$ and $B_r(p)$ is a $d$-dimensional ball of radius $r$ around $p$. Note that
the set $\mathcal{P}$ does not have to be discrete, in which case we can think of $\mathcal{U}(\mathcal{P}, r)$ as a
“tube” around $\mathcal{P}$. The definitions of the complexes are as follows.

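Definition 2.1 (Čech complex). Let $\mathcal{P} = \{x_1, x_2, \ldots, x_n\}$ be a collection of points
in $\mathbb{R}^d$, and let $r > 0$. The Čech complex $\mathcal{C}(\mathcal{P}, r)$ is constructed as follows:

(1) The 0-simplices (vertices) are the points in $\mathcal{P}$.

(2) A $k$-simplex $[x_{i_0}, \ldots, x_{i_k}]$ is in $\mathcal{C}(\mathcal{P}, r)$ if $\bigcap_{j=0}^{k} B_r(x_{i_j}) \ne \emptyset$.

Definition 2.2 (Vietoris-Rips complex). Let $\mathcal{P} = \{x_1, x_2, \ldots, x_n\}$ be a collection
of points in $\mathbb{R}^d$, and let $r > 0$. The Vietoris-Rips complex $\mathcal{R}(\mathcal{P}, r)$ is constructed as
follows:

(1) The 0-simplices (vertices) are the points in $\mathcal{P}$.

(2) A $k$-simplex $[x_{i_0}, \ldots, x_{i_k}]$ is in $\mathcal{R}(\mathcal{P}, r)$ if $B_r(x_{i_j}) \cap B_r(x_{i_l}) \ne \emptyset$ for all
$0 \le j, l \le k$.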

Note that the Rips complex $\mathcal{R}(\mathcal{P}, r)$ is the flag (or clique) complex built on top
of the geometric graph $G(\mathcal{P}, 2r)$, where two vertices $x_i, x_j$ are connected if and only
if $\|x_i - x_j\| \le 2r$. The difference between the Čech and the Rips complexes is that
for the Čech complex we require all $k+1$ balls to intersect in order to include a
face, whereas for the Rips complex we only require pairwise intersections between
the balls. Figure 1 shows an example for the Čech and Rips complexes constructed
from the same set of points and the same radius $r$, and highlights this difference.
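
The difference between the two definitions can be checked numerically on a small planar example. The sketch below is an illustration only, not part of the original construction: it assumes closed balls, restricts attention to 2-simplices in the plane, and uses hypothetical helper names (`min_enclosing_radius_3`, `rips_triangles`, `cech_triangles`). Three closed balls of radius $r$ have a common point exactly when the smallest disc enclosing their centres has radius at most $r$.

```python
import itertools
import math


def min_enclosing_radius_3(a, b, c):
    """Radius of the smallest disc containing three planar points."""
    # Side lengths, sorted so that z is the longest side.
    x, y, z = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
    if z * z >= x * x + y * y:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return z / 2.0
    # Acute triangle: the smallest enclosing disc is the circumscribed disc.
    s = (x + y + z) / 2.0
    area = math.sqrt(s * (s - x) * (s - y) * (s - z))
    return x * y * z / (4.0 * area)


def rips_triangles(points, r):
    """Index triples whose balls of radius r intersect pairwise."""
    return [
        t for t in itertools.combinations(range(len(points)), 3)
        if all(math.dist(points[i], points[j]) <= 2 * r
               for i, j in itertools.combinations(t, 2))
    ]


def cech_triangles(points, r):
    """Index triples whose three balls of radius r share a common point."""
    return [
        t for t in itertools.combinations(range(len(points)), 3)
        if min_enclosing_radius_3(*(points[i] for i in t)) <= r
    ]


# Equilateral triangle with side 1: circumradius 1/sqrt(3), about 0.577.
pts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(rips_triangles(pts, 0.55), cech_triangles(pts, 0.55))
```

For the equilateral triangle of side 1, any radius between 0.5 and about 0.577 places the triangle in the Rips complex but not in the Čech complex, which is the situation depicted in Figure 1.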

Part of the importance of the Čech complex stems from the following statement
known as the “Nerve Lemma” (see SI). We note that the original lemma is more
general than stated here, but we will only be using it in the following special case.
Lemma 2.3. Let $\mathcal{P} \subset \mathbb{R}^d$ be a finite set of points. Then $\mathcal{C}(\mathcal{P}, r)$ is homotopy
equivalent to $\mathcal{U}(\mathcal{P}, r)$, and in particular

$$H_*(\mathcal{C}(\mathcal{P}, r)) = H_*(\mathcal{U}(\mathcal{P}, r)).$$

The Rips complex is commonly used in applications, as in some practical cases it
requires fewer computational resources. In an arbitrary metric space, using the triangle
inequality we have the following inclusions of complexes,

(2.2)  $\mathcal{C}(\mathcal{P}, r) \subset \mathcal{R}(\mathcal{P}, r) \subset \mathcal{C}(\mathcal{P}, 2r).$

For subsets of Euclidean space, the constant 2 can be improved (see [25]).




Figure 1. On the left: the Čech complex $\mathcal{C}(\mathcal{P}, r)$; on the right:
the Rips complex $\mathcal{R}(\mathcal{P}, r)$ with the same set of vertices and the same
radius. We see that the three left-most balls do not have a common
intersection and therefore do not generate a 2-dimensional face in the
Čech complex. However, since all the pairwise intersections occur, the
Rips complex does include the corresponding face.


2.3. Persistent homology. Let $\mathcal{P} \subset \mathbb{R}^d$, and consider the following indexed sets:

$$\mathcal{U} := \{\mathcal{U}(\mathcal{P}, r)\}_{r=0}^{\infty}, \quad \mathcal{C} := \{\mathcal{C}(\mathcal{P}, r)\}_{r=0}^{\infty}, \quad \mathcal{R} := \{\mathcal{R}(\mathcal{P}, r)\}_{r=0}^{\infty}.$$

These three sets are examples of ‘filtrations’: nested sequences of sets, in the sense
that $\mathcal{F}_{r_1} \subset \mathcal{F}_{r_2}$ if $r_1 < r_2$ (where $\mathcal{F}$ is either $\mathcal{U}$, $\mathcal{C}$, or $\mathcal{R}$).

As the parameter $r$ increases, the homology of the spaces $\mathcal{F}_r$ may change. The
persistent homology of $\mathcal{F}$, denoted by $\mathrm{PH}_*(\mathcal{F})$, keeps track of this process. Briefly,
$\mathrm{PH}_k(\mathcal{F})$ contains information about the $k$-th homology of the individual spaces $\mathcal{F}_r$
as well as the mappings between the homology of $\mathcal{F}_{r_1}$ and $\mathcal{F}_{r_2}$ for every $r_1 < r_2$
(induced by the inclusion map). The birth time of an element (a cycle) in $\mathrm{PH}_*(\mathcal{F})$
can be thought of as the value of $r$ where this element appears for the first time.
The death time is the value of $r$ where an element vanishes, or merges with another
existing element.

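Formally, for a filtration $\mathcal{F} = \{X_r\}$ with parameter values $r \in [0, \infty)$, the birth and
death times can be defined as follows.

Definition 2.4. The birth time of an element $\gamma \in \mathrm{PH}_k(\mathcal{F})$ is

$$\gamma_{birth} := \min\{r : \gamma \in H_k(X_r)\}.$$

Definition 2.5. The death time of an element $\gamma \in \mathrm{PH}_k(\mathcal{F})$ is

$$\gamma_{death} := \min\{r : \gamma \in \ker(H_k(X_{\gamma_{birth}}) \to H_k(X_r))\}.$$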

One useful way to describe persistent homology is via the notion of barcodes [M].
A barcode for the persistent homology of a filtration $\mathcal{F}$ is a collection of graphs, one
for each order of homology group. A bar in the $k$-th graph, starting at $b$ and ending
at $d$ ($b < d$), indicates the existence of an element of $\mathrm{PH}_k(\mathcal{F})$ (or a $k$-cycle) whose
birth and death times are $b$ and $d$ respectively. In Figure 2 we present the barcode
for the filtration $\mathcal{U}$ where $\mathcal{P}$ is a set of 50 random points lying inside an annulus. The
intuition is that the longest bars in the barcode represent “true” features in the data
(e.g. the connected component and the 1-cycle in the annulus), whereas the short
bars are regarded as “noise.” It can be shown that the pairing between birth and
death times is sufficient to yield a unique barcode [53].
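
For degree 0 the barcode of the union-of-balls filtration can be computed directly, which may help make the notions of birth and death concrete. The following is a minimal sketch under stated assumptions, not the procedure used for the figures in this paper: every component is born at radius 0, and two components merge (one bar dies) when $r$ reaches half the distance between their closest points, so a union-find pass over the sorted pairwise distances suffices. The function name `h0_barcode` is hypothetical.

```python
import itertools
import math


def h0_barcode(points):
    """Degree-0 bars (birth, death) of the union-of-balls filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the component representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two balls of radius r meet once r reaches half the distance
    # between their centres.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at radius r
            bars.append((0.0, r))  # one of the two bars ends here
    if n > 0:
        bars.append((0.0, math.inf))  # the last component never dies
    return bars


print(h0_barcode([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]))
```

Bars of higher degree require a boundary-matrix reduction and are not covered by this sketch. Note also that degree-0 bars are born at radius 0, so only their death times carry information.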

2.4. The Poisson process. In this paper, the set of points we use to construct
either a Čech or a Rips complex will be generated by a Poisson process $\mathcal{P}_n$, which
can be defined as follows. Let $X_1, X_2, \ldots$ be an infinite sequence of i.i.d. (independent
and identically distributed) random variables in $\mathbb{R}^d$. We will focus on the case where
$X_i$ is uniformly distributed on the unit cube $Q^d = [0,1]^d$. We note, however, that our
results hold (with minor adjustments) for any distribution with a compact support
and density bounded above and below. Next, fix $n > 0$, take $N \sim \mathrm{Poisson}(n)$,
independent of the $X_i$'s, and define

(2.3)  $\mathcal{P}_n = \{X_1, X_2, \ldots, X_N\}.$
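
The definition in (2.3) translates directly into a sampling routine. The sketch below is an illustration using only the Python standard library; the function name `sample_poisson_process` is hypothetical. Since the standard library has no Poisson sampler, $N$ is drawn by counting the arrivals of a unit-rate exponential clock in $[0, n]$, which has the Poisson($n$) distribution.

```python
import random


def sample_poisson_process(n, d, rng=None):
    """Sample the process of (2.3): N ~ Poisson(n) uniform points in [0,1]^d."""
    if rng is None:
        rng = random.Random()
    # Count the arrivals of a unit-rate Poisson clock in the interval [0, n].
    count = 0
    t = rng.expovariate(1.0)
    while t <= n:
        count += 1
        t += rng.expovariate(1.0)
    # Given N = count, the points are i.i.d. uniform on the unit cube.
    return [tuple(rng.random() for _ in range(d)) for _ in range(count)]


points = sample_poisson_process(n=200, d=3, rng=random.Random(0))
print(len(points))
```

The running time of the counting loop is linear in $n$, which is adequate for simulations of moderate size; a library sampler could be substituted for large $n$.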


Two properties characterizing the Poisson process $\mathcal{P}_n$ are:

(1) For every Borel-measurable set $A \subset \mathbb{R}^d$ we have that

$$|\mathcal{P}_n \cap A| \sim \mathrm{Poisson}(n \, \mathrm{Vol}(A \cap Q^d)),$$

where $|\cdot|$ stands for the set cardinality, and $\mathrm{Vol}(\cdot)$ is the Lebesgue measure.

(2) If $A, B \subset \mathbb{R}^d$ are disjoint sets then $|\mathcal{P}_n \cap A|$ and $|\mathcal{P}_n \cap B|$ are independent
random variables (this property is known as ‘spatial independence’).

Figure 2. (a) $\mathcal{F}_r = \mathcal{U}_r$ is a union of balls of radius $r$ around $\mathcal{P}$, a
random set of $n = 50$ points, uniformly distributed on an annulus in
$\mathbb{R}^2$. We present five snapshots of this filtration. (b) The persistent
homology of the filtration $\mathcal{F}$. The x-axis is the radius $r$, and the bars
represent the cycles that are born and die. For $H_0$ we observe that at
radius zero the number of components is exactly $n$ and as the radius
increases components merge (or die). The 1-cycles show up later in
this process. There are two bars that are significantly longer than
the others (one in $H_0$ and one in $H_1$). These correspond to the true
features of the annulus.

The Poisson process $\mathcal{P}_n$ is closely related to the fixed-size set $\{X_1, \ldots, X_n\}$. Note
that the expected number of points in $\mathcal{P}_n$ is $\mathbb{E}\{N\} = n$. In fact, most results known
for one of these processes apply to the other with very minor, or no, changes. This
is true for the results presented in this paper as well. However, we choose to focus
only on $\mathcal{P}_n$, mainly due to its spatial independence property.

In the following we study asymptotic phenomena, when $n \to \infty$. In this context,
if $E_n$ is an event that depends on $n$, we say that $E_n$ occurs with high probability
(w.h.p.) if $\lim_{n \to \infty} \mathbb{P}(E_n) = 1$.


3. Maximally persistent cycles 


For the remainder of this paper assume that $d \ge 2$ and $1 \le k \le d-1$ are fixed.
Let $\mathcal{P}_n$ be the Poisson process defined above, and define

$$\mathcal{U}(n,r) := \mathcal{U}(\mathcal{P}_n, r), \quad \mathcal{C}(n,r) := \mathcal{C}(\mathcal{P}_n, r), \quad \mathcal{R}(n,r) := \mathcal{R}(\mathcal{P}_n, r).$$

Let $\mathrm{PH}_k(n)$ be the $k$-th persistent homology of either of the filtrations for $\mathcal{U}$, $\mathcal{C}$, or $\mathcal{R}$
(it will be clear from the context which filtration we are looking at). Note that from
the Nerve Lemma (Lemma 2.3) we have that $\mathcal{U}(n,r) \simeq \mathcal{C}(n,r)$, so we will state the results
only for $\mathcal{C}$ and $\mathcal{R}$. However, some of the statements we make are easier to prove for
the balls in $\mathcal{U}$ rather than the simplexes in $\mathcal{C}$, and we shall do so.

For every cycle $\gamma \in \mathrm{PH}_k(n)$ we denote by $\gamma_{birth}, \gamma_{death}$ the birth and death times
(radii) of $\gamma$, respectively. Commonly (see imiH]), the persistence of a cycle is
measured by the length of the corresponding bar in the barcode, namely by the
difference $\delta(\gamma) := \gamma_{death} - \gamma_{birth}$. In this paper, however, we choose to define the
persistence of $\gamma$ in a multiplicative way as

(3.1)  $\pi(\gamma) := \dfrac{\gamma_{death}}{\gamma_{birth}}.$

There are several reasons for defining the persistence of a cycle this way.


• This definition is equivalent to saying that we measure the difference in a
logarithmic scale. Studying persistent homology in the logarithmic scale is
common [211 [El EZl SZl EH].

• This definition is scale invariant, which is desirable, since ‘topological
significance’ should focus on shape rather than size. For example, consider the
cycles corresponding to $\gamma_1, \gamma_2$ in Figure 3. These two cycles are created by
exactly the same configuration of points, just at a different scale. Therefore,
we would like to say that these cycles are equally significant. Clearly,
$\delta(\gamma_1) \ne \delta(\gamma_2)$, while $\pi(\gamma_1) = \pi(\gamma_2)$. Thus, our definition works better in this
case.

In addition, this scale invariance guarantees that a linear change in the
units used to measure the data (e.g. from inches to cm) will not affect the
persistence value.

• One purpose of using a persistence measure is to differentiate between cycles
that capture phenomena underlying the data, and those that are created
merely due to chance. To this end, the ‘physical size’ of the cycle is not
necessarily the correct measure. Consider, for example, the cycles corresponding
to $\gamma_2$ and $\gamma_3$ in Figure 3. Intuitively, we would like to claim that $\gamma_2$ is more
significant than $\gamma_3$, as the former is created by a very ‘stable’ configuration of
points, while the latter is created by outliers that clearly tell us nothing about
the underlying structure. In this example, taking the ‘additive’ persistence
we will have that $\delta(\gamma_2) < \delta(\gamma_3)$, simply because the overall size of the annulus
is much smaller than that of the triangle. However, taking multiplicative
persistence yields $\pi(\gamma_2) > \pi(\gamma_3)$, which is more consistent with our intuition.

• Both the Čech and Vietoris-Rips complexes are important in TDA, and the
natural relationship between these complexes is a multiplicative one (see
(2.2)). Because of this relationship, our results hold for both random Čech
and Rips complexes, up to a constant factor (see Section 6.3). Indeed, the
majority of approximation results for geometric complexes are multiplicative
Ha Ha ETj, making multiplicative persistence more relevant to existing
stability guarantees.

• The argument from Section 5 of this paper suggests that there are many
cycles $\gamma$ for which $\gamma_{birth} = o(\gamma_{death})$. In this case, it is hard to differentiate
between cycles by looking at $\gamma_{death} - \gamma_{birth} \approx \gamma_{death}$.
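
The scale-invariance argument can be illustrated with a few lines of code. This is a minimal sketch with hypothetical function names and made-up bar values: rescaling the data by a factor $c$ multiplies both endpoints of every bar by $c$, which changes the additive persistence $\delta$ but leaves the multiplicative persistence $\pi$ of (3.1) unchanged.

```python
def additive_persistence(birth, death):
    """delta = death - birth."""
    return death - birth


def multiplicative_persistence(birth, death):
    """pi = death / birth, as in (3.1); requires a positive birth radius."""
    if birth <= 0:
        raise ValueError("multiplicative persistence needs birth > 0")
    return death / birth


def rescale(bar, c):
    """Effect on a bar (birth, death) of multiplying all distances by c."""
    birth, death = bar
    return (c * birth, c * death)


bar = (0.5, 2.0)                 # illustrative values only
scaled = rescale(bar, 4.0)       # same configuration, four times larger
print(additive_persistence(*bar), additive_persistence(*scaled))
print(multiplicative_persistence(*bar), multiplicative_persistence(*scaled))
```

The positivity check reflects the fact that the ratio is only meaningful for cycles of degree $k \ge 1$, whose birth radius is strictly positive.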


Figure 3. Multiplicative persistence as a significance measure. The
dataset in this example consists of a few hundred points sampled from
two annuli, and two outliers (on the right). We are interested in the
1-cycles denoted by $\gamma_1, \gamma_2, \gamma_3$, which correspond to the two annuli
and the triangle on the right.

Our main interest is in the maximal persistence over all $k$-cycles, defined as

(3.2)  $\Pi_k(n) := \max_{\gamma \in \mathrm{PH}_k(n)} \pi(\gamma).$

More specifically, we are interested in the asymptotic behavior of $\Pi_k(n)$ as $n \to \infty$.
The main result in this paper is that $\Pi_k(n)$ scales like the function $\Delta_k(n)$, defined
by

(3.3)  $\Delta_k(n) := \left(\dfrac{\log n}{\log\log n}\right)^{1/k}.$

In particular we have the following theorem.


Theorem 3.1. For fixed $d \ge 2$, and $1 \le k \le d-1$, let $\mathcal{P}_n$ be a Poisson process on
the unit cube $[0,1]^d$ defined in (2.3), and let $\mathrm{PH}_k(n)$ be the $k$-th persistent homology
of either $\mathcal{C}$ or $\mathcal{R}$. Then there exist positive constants $A_k, B_k$ such that

$$\lim_{n \to \infty} \mathbb{P}\left(A_k \le \frac{\Pi_k(n)}{\Delta_k(n)} \le B_k\right) = 1.$$

Remarks:

(1) The constants $A_k$ and $B_k$ depend on $k$ (the homology degree), $d$ (the ambient
dimension), and on whether we consider the Čech or the Rips complex. We
conjecture that a law of large numbers holds, namely that $\Pi_k(n)/\Delta_k(n) \to C_k$
for some $A_k \le C_k \le B_k$. For some evidence for this conjecture, see the
experimental results. In the following sections we will prove
Theorem 3.1.

(2) The additive persistence $\delta(\gamma)$ can be bounded naively by the result on the
contractibility of the Čech complex in [39]. More concretely, Theorem 6.1
in [39] states that if $r \ge c(\log n/n)^{1/d}$ then the Čech complex is contractible (w.h.p.).
This implies that for every cycle $\gamma$ we have $\delta(\gamma) \le \gamma_{death} \le c(\log n/n)^{1/d}$. Similar
statements can be made about $\mathrm{PH}_0$ using the connectivity radius in jS] |16]
(which is of the same $(\log n/n)^{1/d}$ scale). However, these are only crude
upper bounds on the additive persistence, that do not differentiate between
the different cycles in persistent homology, or even between different degrees
of homology (note that these bounds do not depend on $k$).

(3) The study in [39] suggests the following upper bound for $\Pi_k(n)$. As
mentioned before, we know that $\gamma_{death} \le c(\log n/n)^{1/d}$ for all $\gamma$. In addition, the
analysis in [39] shows that if $n^{k+2} r^{d(k+1)} \to 0$ then $H_k(\mathcal{C}(n,r)) = 0$, which
implies that $\gamma_{birth} \ge c' n^{-\frac{k+2}{d(k+1)}}$ for some $c' > 0$. Therefore, we have that
$\pi(\gamma) = O\left((\log n)^{1/d} n^{\frac{1}{d(k+1)}}\right)$. However, as we shall see later, this is a very
crude upper bound.
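
To get a feeling for the orders of magnitude involved, the growth function $(\log n/\log\log n)^{1/k}$ of (3.3) can be compared numerically with the crude polynomial bound of Remark (3). The sketch below is illustrative only: the function names are hypothetical, all implied constants are set to 1, and the crude bound is read as $(\log n)^{1/d}\, n^{1/(d(k+1))}$, which is what dividing the death-time bound by the birth-time bound in Remark (3) gives.

```python
import math


def delta_k(n, k):
    """(log n / log log n)^(1/k), the growth function of (3.3)."""
    if n <= math.e:
        raise ValueError("need n > e so that log log n > 0")
    return (math.log(n) / math.log(math.log(n))) ** (1.0 / k)


def crude_bound(n, k, d):
    """(log n)^(1/d) * n^(1/(d(k+1))), the rate in Remark (3), constants dropped."""
    return math.log(n) ** (1.0 / d) * n ** (1.0 / (d * (k + 1)))


for n in (10**3, 10**6, 10**9):
    print(n, round(delta_k(n, 1), 3), round(crude_bound(n, 1, 2), 3))
```

Even for $n = 10^9$ the function in (3.3) stays below 10 when $k = 1$, while the crude bound grows polynomially in $n$; this is one way to see why persistence values in simulations remain small.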


4. Proof - Upper Bound 


For this section and the next one, consider the Čech complex only. We want to
prove the upper bound in Theorem 3.1. That is, we need to show that there exists
a constant $B_k > 0$ depending only on $k$ and $d$, so that with high probability

$$\Pi_k(n) \le B_k \Delta_k(n) = B_k \left(\frac{\log n}{\log\log n}\right)^{1/k}.$$

The main idea in proving the upper bound in Theorem 3.1 is to show that large
cycles require the formation of a large connected component in $\mathcal{C}(n,r)$ at a very early
stage (small radius $r$). To this end we will provide two bounds: (1) a lower bound
for the size of the connected component supporting a large cycle (Lemma 4.1), and
(2) an upper bound for the size of connected components in $\mathcal{C}(n,r)$ for small values
of $r$ (Lemma 4.2).


Lemma 4.1. Let $\gamma \in \mathrm{PH}_k(n)$, with $\gamma_{birth} = r$ and $\pi(\gamma) = p$. Then there exists a
constant $C_1$ such that $\mathcal{C}(n,r)$ contains a connected component with at least $m = C_1 p^k$
vertices. The constant $C_1$ depends on $k, d$ only.
The proof for this lemma requires more working knowledge in algebraic topology
than the rest of this paper, and we defer it to Section 6. At this point, we would like to
suggest an intuitive explanation. Suppose that $\mathcal{C}(n,r)$ contains a $k$-cycle such that all
the points generating it lie on a $k$-dimensional sphere of radius $R$, and such that there
are no points of $\mathcal{P}_n$ inside the sphere. In that case the death time of the cycle will
be $R$ and then $\pi(\gamma) = p = R/r$. The minimum number of balls of radius $r$ required
to cover a $k$-dimensional sphere of radius $R$ is known as the “covering number” and
is proportional to $(R/r)^k = p^k$. The cycle created is then a part of a connected
component of $\mathcal{C}(n,r)$ containing at least $C \times p^k$ vertices. Intuitively, creating a cycle
with the same birth and death times in any other way (i.e. not necessarily around
a sphere) will require coverage of an area larger than the $k$-dimensional sphere, and
therefore larger connected components. To make this statement precise, in Section 6
we present an isoperimetric-type inequality for $k$-cycles. Note that this statement
is completely deterministic (i.e. non-random).

The following lemma bounds the number of vertices in a connected component of
the Čech complex $\mathcal{C}(n,r)$, for small values of $r$.


Lemma 4.2. Let $\alpha > 0$ be fixed. There exists a constant $C_2 > 0$ depending only on
$\alpha$ and $d$ such that if

$$n r^d \le \frac{C_2}{(\log n)^{\alpha}} \quad \text{and} \quad m \ge \alpha^{-1} \frac{\log n}{\log\log n},$$

then with high probability $\mathcal{C}(n,r)$ has no connected components with more than $m$
vertices.


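Proof of Lemma 4.2. Let $N_m(r)$ be the number of subsets of $\mathcal{P}_n$ with $m$ vertices
that are connected in $\mathcal{C}(n,r)$. We can write $N_m(r)$ as

$$N_m(r) = \sum_{\mathcal{Y} \subset \mathcal{P}_n,\ |\mathcal{Y}| = m} \mathbf{1}\{\mathcal{C}(\mathcal{Y}, r) \text{ is connected}\},$$

where the sum is over all sets $\mathcal{Y}$ of $m$ vertices. We will show that choosing $r$ and $m$
as the lemma states, we have $\mathbb{P}(N_m(r) > 0) \to 0$, which implies the statement of the
lemma.

By Palm theory (see for example, Theorem 1.6 of [15]) we have that

$$\mathbb{E}\{N_m(r)\} = \frac{n^m}{m!} \mathbb{P}\left(\mathcal{C}(\{X_1, \ldots, X_m\}, r) \text{ is connected}\right),$$

where $X_i \sim U([0,1]^d)$ are i.i.d. variables. If $\mathcal{C}(\{X_1, \ldots, X_m\}, r)$ is connected, then
the underlying graph must contain a subgraph isomorphic to a tree on $m$ vertices.
Suppose that $\Gamma$ is a labelled tree on the vertices $\{1, \ldots, m\}$. Assuming that vertex
1 is the root, for $2 \le i \le m$ let $\mathrm{par}(i)$ be the parent of vertex $i$ in the tree. Suppose
also that the vertices are ordered so that $\mathrm{par}(i) < i$. If $\mathcal{C}(\{X_1, \ldots, X_m\}, r)$ contains
$\Gamma$ then every $X_i$ must be connected to $X_{\mathrm{par}(i)}$, which implies that $X_i \in B_{2r}(X_{\mathrm{par}(i)})$.
Therefore,

$$\mathbb{P}\left(\mathcal{C}(\{X_1, \ldots, X_m\}, r) \text{ contains } \Gamma\right) \le \mathbb{P}\left(X_i \in B_{2r}(X_{\mathrm{par}(i)}),\ \forall\, 2 \le i \le m\right)$$
$$\le \int_{[0,1]^d} \int_{B_{2r}(x_{\mathrm{par}(2)})} \cdots \int_{B_{2r}(x_{\mathrm{par}(m)})} dx_m \cdots dx_1 = (\omega_d 2^d r^d)^{m-1},$$

where $\omega_d$ denotes the volume of the unit ball in $\mathbb{R}^d$.
The second inequality is due to the effect of the boundary of the cube. The same bound
holds for any ordering of the vertices. It is known that the total number of labelled
trees on $m$ vertices is $m^{m-2}$, and therefore we have

$$\mathbb{E}\{N_m(r)\} \le \frac{n^m}{m!} m^{m-2} (\omega_d 2^d r^d)^{m-1}.$$

From Stirling’s approximation we have that $m! \ge (m/e)^m$, and therefore,

$$\mathbb{E}\{N_m(r)\} \le n^m e^m m^{-2} (\omega_d 2^d r^d)^{m-1} = e n m^{-2} (e \omega_d 2^d n r^d)^{m-1}.$$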

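Defining $C_2 = \frac{1}{2}(e \omega_d 2^d)^{-1}$, if $n r^d \le C_2 (\log n)^{-\alpha}$ then

$$\mathbb{E}\{N_m(r)\} \le e n m^{-2} \left(2 (\log n)^{\alpha}\right)^{-(m-1)}.$$

If $m \ge \alpha^{-1} \frac{\log n}{\log\log n}$ then $(\log n)^{-\alpha m} \le n^{-1}$, and we therefore have (for $n$ large enough):

$$\mathbb{E}\{N_m(r)\} \le \frac{e}{m^2},$$

and $e/m^2 \to 0$ as $n \to \infty$.

Finally, by Markov’s inequality, $\mathbb{P}(N_m(r) > 0) \le \mathbb{E}\{N_m(r)\}$, and therefore we
have that $\mathbb{P}(N_m(r) > 0) \to 0$, which completes the proof. □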
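
The first-moment bound behind this proof is easy to evaluate numerically. The sketch below is an illustration under stated assumptions rather than part of the argument: it reads the bound before Stirling's approximation as $\frac{n^m}{m!}\, m^{m-2} (\omega_d 2^d r^d)^{m-1}$, where $\omega_d$ is the volume of the unit ball in dimension $d$, and evaluates its logarithm to avoid overflow. The function names are hypothetical.

```python
import math


def unit_ball_volume(d):
    """Volume omega_d of the unit ball in R^d."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def log_first_moment_bound(n, r, m, d):
    """log of (n^m / m!) * m^(m-2) * (omega_d * 2^d * r^d)^(m-1)."""
    # log of the volume of a ball of radius 2r.
    log_ball = math.log(unit_ball_volume(d)) + d * math.log(2.0 * r)
    return (
        m * math.log(n)
        - math.lgamma(m + 1)          # log m!
        + (m - 2) * math.log(m)       # log of Cayley's tree count m^(m-2)
        + (m - 1) * log_ball
    )


n, d = 10**6, 2
r = math.sqrt(0.001 / n)              # so that n * r^d = 0.001
for m in (4, 6, 8):
    print(m, log_first_moment_bound(n, r, m, d))
```

By Markov's inequality, a strongly negative value of this logarithm means that connected components on $m$ vertices are unlikely at radius $r$, which is the mechanism the lemma exploits.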
With these two lemmas, we can prove the upper bound in Theorem 3.1.


Proof of Theorem 3.1 - upper bound. Fix a value $\alpha > 0$, and consider two kinds of
$k$-cycles: the early-born cycles are the ones created at a radius $r$ satisfying
$nr^d \le C_2 (\log n)^{-\alpha}$ (see Lemma 4.2). The late-born cycles are all the rest.

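If $\gamma \in \mathrm{PH}_k(n)$ is an early-born cycle, then according to Lemma 4.2 it is part of a
connected component with $m \le \alpha^{-1} \frac{\log n}{\log\log n}$ vertices. If $\pi(\gamma) = \rho$, then from
Lemma 4.1 we have that $C_1 \rho^k \le m$. Combining these two statements we have that with
high probability,

$$\pi(\gamma) \le \left(\frac{1}{\alpha C_1}\right)^{1/k} \left(\frac{\log n}{\log\log n}\right)^{1/k}.$$

Therefore $\pi(\gamma) \le B_k \Delta_k(n)$, with $B_k = (\alpha C_1)^{-1/k}$.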

Suppose now that $\gamma \in \mathrm{PH}_k(n)$ is a late-born cycle. This implies that $\gamma_{\mathrm{birth}} = r$
where $nr^d > C_2 (\log n)^{-\alpha}$, or in other words that $\gamma_{\mathrm{birth}} > \left(\frac{C_2}{n(\log n)^{\alpha}}\right)^{1/d}$. Next, in [39]
it is shown (see Theorem 6.1) that there exists $C > 0$ such that if $r \ge C\left(\frac{\log n}{n}\right)^{1/d}$
then with high probability $\mathcal{C}(n,r)$ is contractible (i.e. can be “shrunk” to a point,
and therefore has no nontrivial cycles). In particular, this implies that
$\gamma_{\mathrm{death}} \le C\left(\frac{\log n}{n}\right)^{1/d}$ for every cycle $\gamma$. Thus, for late-born cycles $\gamma$,

$$\pi(\gamma) \le C'(\log n)^{(1+\alpha)/d}.$$

Thus, for any $\alpha < d/k - 1$, we have that with high probability the persistence of
late-born cycles $\gamma$ satisfies

$$\pi(\gamma) = o\left(\left(\frac{\log n}{\log\log n}\right)^{1/k}\right).$$

□


5. Proof - Lower Bound 


In this section we prove the lower bound part of Theorem 3.1 for the Čech complex
$\mathcal{C}(n, r)$, namely that there exists $A_k > 0$ (depending on $k$ and $d$), such that with high
probability,

$$\Pi_k(n) \ge A_k \Delta_k(n) = A_k \left(\frac{\log n}{\log\log n}\right)^{1/k}.$$














In other words, we need to show that with high probability there exists $\gamma \in \mathrm{PH}_k(n)$
with $\pi(\gamma) \ge A_k \Delta_k(n)$.

To show that, we take the unit cube $Q = [0,1]^d$ and divide it into small cubes
of side $2L$. The number of small cubes we can fit in $Q$, denoted by $M$, satisfies
$M \ge C_3 L^{-d}$ for some $C_3 > 0$. Denoting the small cubes by $Q_1, \ldots, Q_M$, we want to
show that at least one of these cubes contains a large cycle. Let $Q_i$ be one of these
cubes, and think of it as centered at the origin, so that $Q_i = [-L, L]^d$. Let $\ell < L/4$,
denote $\hat{L} = \lfloor L/\ell \rfloor \times \ell$, and define

$$\begin{aligned}
S_i^{(1)} &= [-\hat{L}/2, \hat{L}/2]^{k+1} \times [-\ell/2, \ell/2]^{d-k-1}, \\
S_i^{(2)} &= [-\hat{L}/2 + \ell, \hat{L}/2 - \ell]^{k+1} \times [-\ell/2, \ell/2]^{d-k-1}, \\
S_i &= S_i^{(1)} \setminus S_i^{(2)}.
\end{aligned}$$

In other words, $S_i$ is a “thickened” version of the boundary of a $(k+1)$-dimensional
cube of side $\hat{L} \approx L$ (see Figure 4).

We will show that if the balls of radius $r$ around $\mathcal{P}_n$ cover $S_i$ but leave most of $Q_i$
empty then $\mathcal{C}(n, r)$ would have a $k$-dimensional cycle. Choosing $L$ and $\ell$ properly we
can make sure that this cycle has the desired persistence. More specifically, take $S_i$
and split it into $m$ cubes of side $\ell$, denoted by $S_{i,1}, S_{i,2}, \ldots, S_{i,m}$ (see Figure 4). The
number of boxes $m$ is almost proportional to the ratio of the volumes of $S_i$ and the
$S_{i,j}$-s, and therefore $m \le C_4 (L/\ell)^k$ for some $C_4 > 0$. The following lemma uses the
process $\mathcal{P}_n$ but is in fact non-random, and provides a lower bound on the persistence
of the cycles we are looking for.

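Lemma 5.1. Suppose that for every $1 \le j \le m$ we have $|S_{i,j} \cap \mathcal{P}_n| = 1$, and
$|Q_i \cap \mathcal{P}_n| = m$. Then there exists $\gamma \in \mathrm{PH}_k(n)$ with $\pi(\gamma) \ge \frac{1}{4\sqrt{d}} \times \frac{L}{\ell}$.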

The proof of this lemma also requires some working knowledge of algebraic topology,
and therefore we postpone it to Section 6.2. Intuitively, the assumptions of the
lemma guarantee that for every $r \in [r_1, r_2]$, where $r_1 = \sqrt{d}\,\ell$ and $r_2 = L/4$, the
union of balls $\mathcal{U}(\mathcal{P}_n \cap Q_i, r)$ covers $S_i$, and is disconnected from the rest of the balls.
Therefore, its shape is “similar” to $S_i$ and forms a nontrivial $k$-cycle. Since this cycle
exists through the entire range $[r_1, r_2]$ its persistence is at least $r_2/r_1 = L/(4\sqrt{d}\,\ell)$.

Figure 4. The construction we are examining to find a maximal cycle,
for $d = 3$ and $k = 1$. $Q_i$ is the big box of side $2L$, and $S_i$ is the
construction made of small boxes in the middle of it, which is
homotopy equivalent to a circle.


Following Lemma 5.1, we define the event

$$E_i = \left\{\forall\, 1 \le j \le m : |S_{i,j} \cap \mathcal{P}_n| = 1, \text{ and } |Q_i \cap \mathcal{P}_n| = m\right\},$$

then $E = E_1 \cup E_2 \cup \cdots \cup E_M$ is the event that at least one of the $Q_i$ cubes contains a
large cycle. Lemma 5.1 suggests that to prove there exists a large cycle it is enough
to show that $E$ occurs with high probability. We start by bounding the probability
of the complement event. The next lemma shows that given the right choice of
$L = L(n)$ and $\ell = \ell(n)$ we can guarantee that $E = E^{(n)}$ satisfies $\mathbb{P}(E) \to 1$.


Lemma 5.2. Let $n\ell^d = (\log n)^{-\alpha}$ such that $\alpha > d/k$, and let $L = \tilde{A}_k \Delta_k(n)\ell$ where
$\tilde{A}_k < (\alpha C_4)^{-1/k}$. Then

$$\lim_{n \to \infty} \mathbb{P}(E) = 1.$$

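Proof. We start with the probability of $E_i$. By the spatial independence property of
the Poisson process we have

$$\mathbb{P}(E_i) = \left(n\ell^d e^{-n\ell^d}\right)^m e^{-n((2L)^d - m\ell^d)} = (n\ell^d)^m e^{-n(2L)^d},$$

and therefore,

$$\mathbb{P}(E^c) = \prod_{i=1}^{M} (1 - \mathbb{P}(E_i)) = \left(1 - (n\ell^d)^m e^{-n(2L)^d}\right)^M.$$

Since $(1 - p)^M \le e^{-Mp}$, in order to prove that $\mathbb{P}(E) \to 1$ it is enough to show that

$$S := M (n\ell^d)^m e^{-n(2L)^d} \to \infty.$$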

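Recall that $M \ge C_3 L^{-d}$ and that $m \le C_4 (L/\ell)^k$. Assuming that $n\ell^d < 1$ we have

$$S \ge C_3 L^{-d} (n\ell^d)^{C_4 (L/\ell)^k} e^{-2^d n L^d}.$$

Now, if $n\ell^d = (\log n)^{-\alpha} < 1$ for some $\alpha > 0$ and $L = \tilde{A}_k \Delta_k(n)\ell$ for some $\tilde{A}_k > 0$,
then

$$S \ge C_3 \tilde{A}_k^{-d} \Delta_k(n)^{-d}\, n (\log n)^{\alpha}\, n^{-C_4 \tilde{A}_k^k \alpha}\, e^{-2^d n L^d}.$$

Taking $\alpha > d/k$ yields that $nL^d \to 0$, and therefore

$$S \ge C\, \frac{n^{1 - C_4 \tilde{A}_k^k \alpha}\, (\log\log n)^{d/k}}{(\log n)^{d/k - \alpha}}$$

for some constant $C$. Choosing $\tilde{A}_k$ such that $C_4 \tilde{A}_k^k \alpha < 1$ we have $S \to \infty$, which
completes the proof. □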
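The role of the condition on the constant can be illustrated numerically. The sketch below assumes the quantity of interest is $S = M (n\ell^d)^m e^{-n(2L)^d}$ with $M = C_3 L^{-d}$, $m = C_4 (L/\ell)^k$, $n\ell^d = (\log n)^{-\alpha}$ and $L = A\,\Delta_k(n)\,\ell$. The function name `log_S` and the default values $C_3 = C_4 = 1$ are placeholders for illustration, not values taken from the proof.

```python
import math


def log_S(n: float, alpha: float, A: float, k: int, d: int,
          C3: float = 1.0, C4: float = 1.0) -> float:
    """log of S = M * (n l^d)^m * exp(-n (2L)^d) under the scalings above."""
    log_n = math.log(n)
    log_nld = -alpha * math.log(log_n)         # log(n * l^d) = -alpha * log log n
    log_ell = (log_nld - log_n) / d            # l = ((log n)^-alpha / n)^(1/d)
    delta = (log_n / math.log(log_n)) ** (1.0 / k)
    log_L = math.log(A * delta) + log_ell      # L = A * Delta_k(n) * l
    log_M = math.log(C3) - d * log_L           # M = C3 * L^-d
    m = C4 * (A * delta) ** k                  # m = C4 * (L / l)^k
    n_2L_d = math.exp(log_n + d * (math.log(2.0) + log_L))
    return log_M + m * log_nld - n_2L_d
```

With $d = 2$, $k = 1$, $\alpha = 3$, the value of `log_S` grows with $n$ when $A = 0.2$ (so that $C_4 A^k \alpha < 1$) and decreases when $A = 1$, in line with the threshold in the proof.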


Proof of Theorem 3.1 - Lower bound. From Lemma 5.2 we have that if $n\ell^d = (\log n)^{-\alpha}$
and $L/\ell = \tilde{A}_k \Delta_k(n)$ then with high probability $E$ occurs. From Lemma 5.1 this
implies that with high probability we have a “cubical” cycle $\gamma$ with
$\pi(\gamma) \ge \tilde{A}_k \Delta_k(n)/(4\sqrt{d})$. Taking $A_k = \tilde{A}_k/(4\sqrt{d})$ completes the proof. □


6. Proofs for Topological Lemmas 


As mentioned above, the proofs for Lemmas 4.1 and 5.1 require some working
knowledge of algebraic topology. In particular, we will be making use of the
definitions of chains, cycles, boundaries and induced maps in both simplicial and singular
homology. For more background, see [36] or [13]. To keep the paper readable
for readers who are less familiar with the subject, we deferred these proofs to this
section. Also included in this section is the translation of Theorem 3.1 from the Čech
to the Rips complex.
6.1. Proof of Lemma 4.1. First, we restate the lemma.

Lemma 4.1. Let $\gamma \in \mathrm{PH}_k(n)$, with $\gamma_{\mathrm{birth}} = r$ and $\pi(\gamma) = \rho$. Then there exists a
constant $C_1$ such that $\mathcal{C}(n, r)$ contains a connected component with at least $m = C_1 \rho^k$
vertices. The constant $C_1$ depends on $k, d$ only.


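For the sake of simplicity, we will be using homology with coefficients in $\mathbb{F} = \mathbb{Z}/2\mathbb{Z}$.
Nevertheless, Lemma 4.1 holds using coefficients over any field.

For every two spaces $S_1 \subset S_2$ we denote by $i : S_1 \hookrightarrow S_2$ the inclusion map, and
the induced map in homology will be $i_* : H_*(S_1) \to H_*(S_2)$. For any finite set
$\mathcal{P} \subset [0,1]^d$ and every $r > 0$, by the Nerve Lemma 2.3 the spaces $\mathcal{C}(\mathcal{P}, r)$ and $\mathcal{U}(\mathcal{P}, r)$ are
homotopy equivalent. Therefore, there are natural maps $h : \mathcal{U}(\mathcal{P}, r) \to \mathcal{C}(\mathcal{P}, r)$ and
$j : \mathcal{C}(\mathcal{P}, r) \to \mathcal{U}(\mathcal{P}, r)$ such that the induced maps $h_* : H_*(\mathcal{U}(\mathcal{P}, r)) \to H_*(\mathcal{C}(\mathcal{P}, r))$
and $j_* : H_*(\mathcal{C}(\mathcal{P}, r)) \to H_*(\mathcal{U}(\mathcal{P}, r))$ are isomorphisms.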

The explicit construction of $j$ is as follows. Each vertex in $\mathcal{C}(\mathcal{P}, r)$ is sent to the
center of the corresponding ball. The map is then extended to every simplex by
mapping it to the convex hull of the points its vertices are mapped to. Each simplex
is a convex set and it is straightforward to check that in Euclidean space, the image
of each simplex lies within the union of balls $\mathcal{U}(\mathcal{P}, r)$. This way for every $k$-simplex
$\sigma \in \mathcal{C}(\mathcal{P}, r)$ we can define its volume $\mathrm{Vol}_k(\sigma)$ to be the $k$-dimensional Lebesgue
measure of $j(\sigma) \subset \mathbb{R}^d$.

With the volume of a simplex defined, we can now define the volume of a chain. If
$\gamma \in C_k(\mathcal{C}(\mathcal{P}, r))$ is a $k$-chain of the form $\gamma = \sum_i a_i \sigma_i$ ($a_i \in \{0, 1\}$), then
$\mathrm{Vol}_k(\gamma) := \sum_i a_i \mathrm{Vol}_k(\sigma_i)$. In other words, the volume of a chain is defined to be the sum of the
volumes of the simplexes it contains.
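As an illustration of these definitions, the $k$-dimensional volume of a geometric simplex can be computed from the Gram determinant of its edge vectors, $\mathrm{Vol}_k(\sigma) = \sqrt{\det G}/k!$, and the volume of a chain is the coefficient-weighted sum. The sketch below uses only the standard library; the function names are hypothetical and not part of the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _det(matrix: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size, det = len(a), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda row: abs(a[row][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for row in range(c + 1, size):
            f = a[row][c] / a[c][c]
            for col in range(c, size):
                a[row][col] -= f * a[c][col]
    return det


def simplex_volume(vertices: Sequence[Point]) -> float:
    """k-dimensional volume of the convex hull of k+1 points in R^d."""
    v0 = vertices[0]
    edges = [[x - y for x, y in zip(v, v0)] for v in vertices[1:]]
    gram = [[sum(s * t for s, t in zip(u, w)) for w in edges] for u in edges]
    return math.sqrt(max(_det(gram), 0.0)) / math.factorial(len(edges))


def chain_volume(simplices: Sequence[Sequence[Point]], coeffs: Sequence[int]) -> float:
    """Volume of a chain sum_i a_i sigma_i with coefficients a_i in {0, 1}."""
    return sum(a * simplex_volume(s) for s, a in zip(simplices, coeffs))
```

For example, the two triangles obtained by cutting the unit square along a diagonal form a chain of total volume 1 when both coefficients equal 1.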

To prove Lemma 4.1 we will be using an isoperimetric inequality related to singular
cycles in $\mathcal{U}(\mathcal{P}, r)$ (see Theorem 6.2), rather than work directly with the simplicial
cycles. To try to avoid confusion we will use $\gamma$ to refer to simplicial cycles, and
$\eta$ for singular cycles. Recall that a singular $k$-simplex in $\mathbb{R}^d$ is actually a map
$\sigma : \Delta^k \to \mathbb{R}^d$ where $\Delta^k$ is the standard $k$-simplex. For brevity, we will identify every
singular simplex $\sigma$ with its image $\mathrm{Im}(\sigma) \subset \mathbb{R}^d$, and every $k$-chain $\eta = \sum_i a_i \sigma_i$ with
the union $\bigcup_i \mathrm{Im}(\sigma_i) \subset \mathbb{R}^d$. We will also need to define the volume of a singular
$k$-chain. Such a definition exists (cf. [33]), however we will be looking only at chains
that are of the form $\eta = j(\gamma)$ where $\gamma$ is a simplicial $k$-chain in $\mathcal{C}(\mathcal{P}, r)$, and for those
we can simply define $\mathrm{Vol}_k(\eta) := \mathrm{Vol}_k(\gamma)$.

Next, we define the filling radius of a singular $k$-cycle. Intuitively, the filling radius
of a cycle measures how much we need to “inflate” the cycle to get it filled in (so it
becomes trivial). Formally,


Definition 6.1. Let $\eta$ be a compactly supported singular cycle in $\mathcal{U}(\mathcal{P}, r)$. A filling
of $\eta$ is a $(k+1)$-chain $\Gamma$ in $\mathbb{R}^d$ such that $\partial\Gamma = \eta$. The filling radius $R_{\mathrm{fill}}(\eta)$ is defined
as

$$R_{\mathrm{fill}}(\eta) = \inf\left\{\rho > 0 : \exists\, \Gamma \text{ such that } \eta = \partial\Gamma \text{ and } \Gamma \subset \mathcal{U}(\eta, \rho)\right\}.$$

In other words, $R_{\mathrm{fill}}(\eta)$ is the smallest $\rho$ such that the “$\rho$-thickening” of $\eta$ contains
some filling $\Gamma$.


The workhorse of our proof of Lemma 4.1 is the following general isoperimetric
inequality due to Federer and Fleming. For a proof, see either the original article
or Section 3 of Guth’s expository notes on Gromov’s systolic inequality.


Theorem 6.2 (Volume to filling radius, isoperimetric inequality). Let $\eta$ be
a singular $k$-cycle, such that $\mathrm{Vol}_k(\eta) = V$. Then the filling radius of $\eta$ satisfies

$$R_{\mathrm{fill}}(\eta) \le C V^{1/k},$$

for some constant $C$ (depending on $k, d$).


Recall that as in Definition 6.1, $\eta$ is a $k$-cycle in $\mathcal{U}(\mathcal{P}, r)$. However, it is worth
noting that for any $k$-cycle $\gamma \in \mathcal{C}(\mathcal{P}, r)$, there is a canonical inclusion into $\mathcal{U}(\mathcal{P}, r)$.
This is the geometric realization of $\gamma$ (although it need not be embedded). Hence,
this result also holds for cycles in the Čech complex.

To prove Lemma 4.1 we will thus need to take two steps - (1) bound the volume
of a cycle $\eta$, and (2) bound the death time of $\eta$ using the filling radius $R_{\mathrm{fill}}(\eta)$. We start
with the following definition.


Definition 6.3. Let $X$ be a set in $\mathbb{R}^d$. For $\epsilon > 0$ the set $S$ is called an $\epsilon$-net of $X$ if:

(1) $S \subset X$,

(2) $X \subset \mathcal{U}(S, \epsilon)$, i.e. $X$ is covered by the balls of radius $\epsilon$ around $S$, and

(3) For every $p_1, p_2 \in S$, $\|p_1 - p_2\| \ge \epsilon$.

In other words, an $\epsilon$-net is both an $\epsilon$-cover and an $\epsilon$-packing.

$\epsilon$-nets are a standard construction in computational geometry and exist for any
metric space [18]. They can be constructed incrementally using the following
algorithm: (1) Initialize $S$ to be the empty set. (2) Select any uncovered point in $X$ and
add it to $S$. (3) Mark all points of distance less than $\epsilon$ from the selected point as
covered. (4) Repeat 2-3 until there are no uncovered points.
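The incremental algorithm above can be sketched for a finite point set as follows. This is a minimal illustration, assuming the Euclidean metric and a finite list of points; the function name `greedy_epsilon_net` is hypothetical. A point is "uncovered" exactly when it is at distance at least $\epsilon$ from every point selected so far, so a single pass over the points is enough.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_epsilon_net(points: Sequence[Point], eps: float) -> List[Point]:
    """Greedy epsilon-net of a finite point set in Euclidean space."""
    net: List[Point] = []
    for p in points:
        # p is still uncovered if no selected point lies within distance < eps
        if all(math.dist(p, s) >= eps for s in net):
            net.append(p)
    return net
```

By construction the returned set is a subset of the input, any two of its points are at distance at least $\epsilon$, and every input point lies within distance less than $\epsilon$ of some selected point. This simple version takes time proportional to the number of points times the size of the net.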

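Next, let $\mathcal{P} = \{x_1, x_2, \ldots, x_m\} \subset \mathbb{R}^d$ and let $S \subset \mathcal{P}$ be an $\epsilon$-net of $\mathcal{P}$. By the
definition of $\epsilon$-nets, the following holds:

(6.1) $\mathcal{P} \subset \mathcal{U}(S, \epsilon)$,

(6.2) $\|p_i - p_j\| \ge \epsilon \quad \forall\, p_i \ne p_j \in S$.

Using (6.1) and the triangle inequality, we also have

(6.3) $\mathcal{U}(\mathcal{P}, \epsilon) \subset \mathcal{U}(S, 2\epsilon) \subset \mathcal{U}(\mathcal{P}, 2\epsilon)$.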


We will use the intermediate construction $\mathcal{U}(S, 2\epsilon)$ to bound the volume of cycles.
In particular, we will need the following lemma. We use $[\cdot]$ to denote the equivalence
class in homology of a corresponding cycle.


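Lemma 6.4. Let $\mathcal{P}$ and $S$ be as defined above, and let $\gamma$ be a $k$-cycle in $\mathcal{C}(S, 2\epsilon)$.
Then $\mathrm{Vol}_k(\gamma) \le C_5 m \epsilon^k$, where $C_5$ depends only on $k, d$. Consequently, for every
(singular) cycle $\eta$ in $\mathcal{U}(S, 2\epsilon)$ there exists a homologous cycle $\eta'$ such that $[\eta] = [\eta']$
and such that $\mathrm{Vol}_k(\eta') \le C_5 m \epsilon^k$.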


Proof. The $k$-dimensional volume of $\gamma$ is the sum of the $k$-volumes of the simplexes
in $\gamma$. This can be bounded by the maximal volume induced by any one simplex
multiplied by the number of simplexes in $\gamma$.

To bound the number of simplexes, first observe that $\gamma$ is supported on $S$. By
(6.2) every pair of vertices $p_1, p_2 \in S$ are at distance $\|p_1 - p_2\| \ge \epsilon$. So the balls
centered at points in $S$ of radius $\epsilon/2$ are disjoint. This implies, by a sphere packing
bound, that every vertex in $S$ has only a bounded number of neighboring vertices
in $\mathcal{C}(S, 2\epsilon)$, namely the maximum number of disjoint balls of radius $\epsilon/2$ that can fit
in a ball of radius $4\epsilon$. This sphere packing number is clearly bounded above by $8^d$,
the ratio of the volumes of these spheres. This implies that every vertex is contained
in at most $\binom{8^d}{k}$ $k$-dimensional faces and since by assumption there are at most $m$
vertices in $\mathcal{P}$ and hence $S$, there are at most $m\binom{8^d}{k}$ $k$-dimensional faces in total.

To bound the maximal volume of a single simplex, observe that the longest
edge in any simplex of $\gamma$ has length at most $4\epsilon$. Therefore, for every simplex $\sigma$ in $\gamma$
we have $\mathrm{Vol}_k(\sigma) \le (4\epsilon)^k$ (the volume of a cube of side $4\epsilon$).

To conclude, we have shown that $\gamma$ has at most $m\binom{8^d}{k}$ simplexes, and the volume of
each of them is bounded by $(4\epsilon)^k$. Therefore, $\mathrm{Vol}_k(\gamma) \le C_5 m \epsilon^k$ where $C_5 = 4^k \binom{8^d}{k}$.

Next, let $\eta$ be a cycle in $\mathcal{U}(S, 2\epsilon)$. Since the map $j_* : H_k(\mathcal{C}(S, 2\epsilon)) \to H_k(\mathcal{U}(S, 2\epsilon))$
is an isomorphism, we can look at the homology class $j_*^{-1}([\eta])$ and take a
representative cycle $\gamma$. Defining $\eta' = j(\gamma)$ then $[\eta'] = j_* \circ j_*^{-1}([\eta]) = [\eta]$, so $\eta$ and $\eta'$ are
homologous. In addition, since $\gamma$ is a cycle in $\mathcal{C}(S, 2\epsilon)$ and $\eta' = j(\gamma)$, we have that
$\mathrm{Vol}_k(\eta') = \mathrm{Vol}_k(\gamma) \le C_5 m \epsilon^k$. That completes the proof. □

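For the next lemma, consider the following sequence of maps in homology (induced
by inclusion maps),

$$H_k(\mathcal{U}(\mathcal{P}, \epsilon)) \xrightarrow{i_*} H_k(\mathcal{U}(S, 2\epsilon)) \xrightarrow{i_*} H_k(\mathcal{U}(\mathcal{P}, 2\epsilon)).$$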


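Lemma 6.5 (Vertices to volume). Let $\mathcal{P} = \{x_1, x_2, \ldots, x_m\} \subset \mathbb{R}^d$. Suppose that $\eta$
is an arbitrary $k$-cycle in $\mathcal{U}(\mathcal{P}, \epsilon)$, and let $i \circ i(\eta)$ be its image in $\mathcal{U}(\mathcal{P}, 2\epsilon)$. Then there
exists a $k$-cycle $\eta'$ in $\mathcal{U}(\mathcal{P}, 2\epsilon)$, homologous to $i \circ i(\eta)$, such that $\mathrm{Vol}_k(\eta') \le C_5 m \epsilon^k$
for some constant $C_5 > 0$ depending only on $k$ and $d$.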


Proof. Let $i(\eta)$ be the inclusion of $\eta$ into $\mathcal{U}(S, 2\epsilon)$. From Lemma 6.4 we have that
there exists a cycle $\eta''$ in $\mathcal{U}(S, 2\epsilon)$ such that $[\eta''] = [i(\eta)]$ and such that
$\mathrm{Vol}_k(\eta'') \le C_5 m \epsilon^k$. Defining $\eta' = i(\eta'')$ then $[\eta'] = i_*([\eta'']) = i_*([i(\eta)]) = [i \circ i(\eta)]$, and since the
inclusion does not change the volume we have $\mathrm{Vol}_k(\eta') = \mathrm{Vol}_k(\eta'') \le C_5 m \epsilon^k$. That
completes the proof. □


Finally, we relate the filling radius to the persistence of the cycles.

Lemma 6.6 (Filling radius to persistence). If $\eta$ is a cycle in $\mathcal{U}(\mathcal{P}, r)$, with a
filling radius $R_{\mathrm{fill}}(\eta) = R$, then $\eta_{\mathrm{death}} \le R + r$.

Proof. Since $\eta$ is a cycle in $\mathcal{U}(\mathcal{P}, r)$, then by the triangle inequality we have that
$\mathcal{U}(\eta, R) \subset \mathcal{U}(\mathcal{P}, r + R)$. By the definition of $R_{\mathrm{fill}}$ (see Definition 6.1), this implies
that there exists a $(k+1)$-chain $\Gamma$ in $\mathcal{U}(\mathcal{P}, R + r)$ such that $\eta = \partial\Gamma$. Therefore, in
$\mathcal{U}(\mathcal{P}, R + r)$ the cycle $\eta$ is already trivial, which implies that $\eta_{\mathrm{death}} \le R + r$. □


We are now ready to prove Lemma 4.1.


Proof of Lemma 4.1. Let $\gamma \in \mathrm{PH}_k(n)$ with $\gamma_{\mathrm{birth}} = r$. Suppose that the simplexes
constructing $\gamma$ are contained in a connected component with $m$ vertices of
$\mathcal{C}(n, r) = \mathcal{C}(\mathcal{P}_n, r)$. Let $\mathcal{P} \subset \mathcal{P}_n$ be the set of vertices in this connected component, then
necessarily $\gamma$ is also a cycle in $\mathcal{C}(\mathcal{P}, r)$.

Next, take the corresponding cycle $\eta = j(\gamma)$ in $\mathcal{U}(\mathcal{P}, r)$. According to Lemma 6.5
there exists a cycle $\eta'$ in $\mathcal{U}(\mathcal{P}, 2r)$, homologous to $i \circ i(\eta)$, such that $\mathrm{Vol}_k(\eta') \le C_5 m r^k$,
and from Theorem 6.2 this implies that $R_{\mathrm{fill}}(\eta') \le C (C_5 m r^k)^{1/k} = C' m^{1/k} r$. Using
Lemma 6.6 we then have that $\eta'_{\mathrm{death}} \le r(C' m^{1/k} + 2)$. Since $\eta'$ and $i \circ i(\eta)$ are
homologous, $\eta$ and $\eta'$ share the same death time, which in turn implies that $\gamma$
and $\eta'$ share the same death time. Therefore, $\pi(\gamma) \le C' m^{1/k} + 2 \le C'' m^{1/k}$. In other
words, if $\pi(\gamma) = \rho$ then we have that $\rho^k \le m (C'')^k$. Taking $C_1 = 1/(C'')^k$ completes
the proof. □


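6.2. Proof of Lemma 5.1. We first restate the lemma.

Lemma 5.1. Suppose that for every $1 \le j \le m$ we have $|S_{i,j} \cap \mathcal{P}_n| = 1$, and
$|Q_i \cap \mathcal{P}_n| = m$. Then there exists $\gamma \in \mathrm{PH}_k(n)$ with $\pi(\gamma) \ge \frac{1}{4\sqrt{d}} \times \frac{L}{\ell}$.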

Proof. Let $r_1 = \sqrt{d}\,\ell$ and $r_2 = L/4$. The assumptions that $|S_{i,j} \cap \mathcal{P}_n| = 1$ for every
$1 \le j \le m$ and $|Q_i \cap \mathcal{P}_n| = m$ assure that:

• For every $r \ge r_1$ the set $\mathcal{U}(\mathcal{P}_n \cap Q_i, r)$ is connected and covers $S_i$;

• For every $r \le r_2$ the sets $\mathcal{U}(\mathcal{P}_n \cap Q_i, r)$ and $\mathcal{U}(\mathcal{P}_n \setminus Q_i, r)$ are disjoint.

In other words, for every $r \in [r_1, r_2]$ the set $\mathcal{U}(\mathcal{P}_n \cap Q_i, r)$ is a connected component
of $\mathcal{U}(n, r)$. We will show that this component contains the desired cycle.
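Defining $S_i^{(r)} = \mathcal{U}(S_i, r)$, for every $r \in [r_1, r_2]$ we have

$$S_i \subset \mathcal{U}(\mathcal{P}_n \cap Q_i, r) \subset S_i^{(r)}.$$

In addition, for every $r \in [r_1, r_2]$, the inclusion $S_i \hookrightarrow S_i^{(r)}$ is a homotopy equivalence
and both spaces are homotopy equivalent to a $k$-dimensional sphere, and in particular
have a nontrivial $k$-cycle. A standard argument in algebraic topology (using the
induced maps in homology) yields that $\mathcal{U}(\mathcal{P}_n \cap Q_i, r)$ must have a nontrivial $k$-cycle
as well. Intuitively, since the $k$-cycle in $S_i$ “survives” the inclusion into $S_i^{(r)}$, it must
also be present in the intermediate set $\mathcal{U}(\mathcal{P}_n \cap Q_i, r)$. Now consider the following
sequence induced by the inclusion maps.

$$H_k(S_i) \xrightarrow{i_*} H_k(\mathcal{U}(\mathcal{P}_n \cap Q_i, r_1)) \xrightarrow{i_*} H_k(\mathcal{U}(\mathcal{P}_n \cap Q_i, r_2)) \xrightarrow{i_*} H_k(S_i^{(r_2)})$$

Let $\eta$ be a nontrivial cycle in $S_i$, then $i_* \circ i_* \circ i_*([\eta]) \ne 0$ since by assumption
$i \circ i \circ i(\eta)$ is a nontrivial cycle in $S_i^{(r_2)}$ as well. Consequently, we must have
$i_*([\eta]) \ne 0$ and $i_* \circ i_*([\eta]) \ne 0$. Next, define $\gamma = h \circ i(\eta)$ - a cycle in $\mathcal{C}(\mathcal{P}_n, r_1)$, then
$\gamma$ is nontrivial and so is $i(\gamma)$ in $\mathcal{C}(\mathcal{P}_n, r_2)$. Therefore, $\gamma_{\mathrm{birth}} \le r_1$ and $\gamma_{\mathrm{death}} \ge r_2$,
and then

$$\pi(\gamma) = \frac{\gamma_{\mathrm{death}}}{\gamma_{\mathrm{birth}}} \ge \frac{r_2}{r_1} = \frac{1}{4\sqrt{d}} \times \frac{L}{\ell},$$

this completes the proof. □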


6.3. Proof of Theorem 3.1 for the Vietoris-Rips Filtration. 


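Proof. Let $r_2 > 2r_1$, and consider the following sequence of maps induced by the
inclusions in (2.2).

$$H_k(\mathcal{C}(n, r_1)) \xrightarrow{i_*} H_k(\mathcal{R}(n, r_1)) \xrightarrow{i_*} H_k(\mathcal{R}(n, r_2/2)) \xrightarrow{i_*} H_k(\mathcal{C}(n, r_2))$$

Suppose there exists a cycle $\gamma$ in $\mathcal{C}(n, r_1)$ with $\gamma_{\mathrm{death}} > r_2$. Then necessarily
$i_* \circ i_* \circ i_*([\gamma]) \ne 0$, which implies that both $i_*([\gamma]) \ne 0$ and $i_* \circ i_*([\gamma]) \ne 0$. Therefore,
there exists a nontrivial cycle $\gamma' = i(\gamma)$ in $\mathcal{R}(n, r_1)$ such that $\gamma'_{\mathrm{death}} \ge r_2/2$, and
consequently $\pi(\gamma') \ge r_2/2r_1$. Thus,

(6.4) $\mathbb{P}\left(\Pi_k^{\mathcal{C}}(n) \ge A_k \Delta_k(n)\right) \le \mathbb{P}\left(\Pi_k^{\mathcal{R}}(n) \ge A_k \Delta_k(n)/2\right)$.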
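On the other hand, we can look at the following sequence for $r_2 > 2r_1$,

$$H_k(\mathcal{R}(n, r_1)) \xrightarrow{i_*} H_k(\mathcal{C}(n, 2r_1)) \xrightarrow{i_*} H_k(\mathcal{C}(n, r_2)) \xrightarrow{i_*} H_k(\mathcal{R}(n, r_2)).$$

Suppose that there exists a cycle $\gamma$ in the Rips filtration with $\gamma_{\mathrm{birth}} \le r_1$ and
$\gamma_{\mathrm{death}} \ge r_2$. Then there exists a cycle $\gamma'$ in the Čech filtration with $\gamma'_{\mathrm{birth}} \le 2r_1$ and
$\gamma'_{\mathrm{death}} \ge r_2$, and therefore, $\pi(\gamma') \ge r_2/2r_1$. Thus,

(6.5) $\mathbb{P}\left(\Pi_k^{\mathcal{C}}(n) \le B_k \Delta_k(n)\right) \le \mathbb{P}\left(\Pi_k^{\mathcal{R}}(n) \le 2B_k \Delta_k(n)\right)$.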


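To conclude we have that

$$\mathbb{P}\left(A_k \le \frac{\Pi_k^{\mathcal{C}}(n)}{\Delta_k(n)} \le B_k\right) \le \mathbb{P}\left(A_k/2 \le \frac{\Pi_k^{\mathcal{R}}(n)}{\Delta_k(n)} \le 2B_k\right).$$

Since the left hand side converges to 1 so does the right hand side, which completes
the proof. □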
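The factor of 2 between the two filtrations can be seen on a single triangle. The sketch below assumes the convention that a simplex enters the Rips complex at radius $r$ once all pairwise distances are at most $2r$, and enters the Čech complex at radius $r$ once the balls of radius $r$ around its vertices have a common point (that is, at the radius of the minimum enclosing ball). The function names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def rips_value(a: Point, b: Point, c: Point) -> float:
    """Radius at which the triangle abc enters the Rips filtration."""
    return max(math.dist(a, b), math.dist(b, c), math.dist(a, c)) / 2.0


def cech_value(a: Point, b: Point, c: Point) -> float:
    """Radius at which abc enters the Cech filtration (min enclosing ball)."""
    x, y, z = sorted((math.dist(a, b), math.dist(b, c), math.dist(a, c)))
    if z * z >= x * x + y * y:
        # right, obtuse or degenerate triangle: the longest edge is a diameter
        return z / 2.0
    s = (x + y + z) / 2.0
    area = math.sqrt(s * (s - x) * (s - y) * (s - z))  # Heron's formula
    return x * y * z / (4.0 * area)                     # circumradius
```

For every triangle the Rips value is at most the Čech value, which in turn is at most twice the Rips value, matching the chain of inclusions used in the proof. Since only edge lengths are used, the sketch works in any ambient dimension.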


7. Numerical Experiments 

In this section, we present numerical simulations demonstrating the behavior of
$\Pi_k(n)$ for the Čech complex in dimensions $d = 2, 3$ and 4. The experiments were
run by generating a Poisson process with rate $n$ in the unit cube of the appropriate
dimension. To generate randomness we used the standard implementation of the
Mersenne Twister [1]. The persistence diagram computation was done using the
PHAT library [6].
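The sampling step can be sketched as follows. This is an illustrative reconstruction, not the code used for the experiments: it draws the number of points from a Poisson distribution with mean $n$ (by counting unit-rate exponential arrivals in $[0, n]$) and then places that many uniform points in $[0,1]^d$. Python's `random` module is based on the Mersenne Twister; the function name `sample_poisson_process` is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def sample_poisson_process(n: float, d: int,
                           seed: Optional[int] = None) -> List[Tuple[float, ...]]:
    """Homogeneous Poisson process of intensity n in the unit cube [0,1]^d."""
    rng = random.Random(seed)
    # The number of unit-rate exponential arrivals in [0, n] is Poisson(n).
    count = 0
    t = rng.expovariate(1.0)
    while t <= n:
        count += 1
        t += rng.expovariate(1.0)
    # Given the count, the points are i.i.d. uniform in the cube.
    return [tuple(rng.random() for _ in range(d)) for _ in range(count)]
```

The counting loop takes time proportional to $n$, which is acceptable for a sketch; a dedicated Poisson sampler would be preferable for very large intensities.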

For each sample, the Čech complex is built until the point of coverage (or very near
coverage), since past coverage the complex is contractible and there are no changes
in homology. In dimensions 2 and 3, instead of the Čech filtration, we use the
$\alpha$-shape filtration [2H] which is based on the Delaunay triangulation. To compute the
triangulations, we used the CGAL library [50]. The key benefit of this construction
is that the simplicial complex is of a smaller size, e.g. in 2 dimensions the size of the
Delaunay triangulation is at most quadratic in the number of points. The persistence
diagrams are the same since for any parameter $r$, the $\alpha$-complex and Čech complex
are homotopy equivalent (see [22]), giving rise to isomorphic homology groups.

The results are shown in Figure 5. The number of points was varied from 100 to
1,000,000 (in higher dimensions, this was reduced due to computational complexity).

Figure 5. Plots of maximum persistence for the Čech filtration,
against the proper scaling term $\Delta_k(n)$. We tested different dimensions
for the homology and for the ambient space. (A) $H_1$ in $\mathbb{R}^2$ (B)
$H_1$ in $\mathbb{R}^3$ (C) $H_2$ in $\mathbb{R}^3$ (D) $H_2$ in $\mathbb{R}^4$. Each point is the result of a
different trial, and the red line represents the best linear fit. For (A),
(B), and (C) the range of points is $n = 10^2$ to $10^6$. For (D), the range
is roughly $n = 10^2$ to $10^4$. The reduced range is a consequence of
computational considerations - the number of simplices grows quickly
as the dimension increases.

We tested the behavior of $\Pi_k(n)$ for a few values of $k$ and $d$ (the ambient dimension).
For $d = 2$, the only interesting case is $k = 1$, namely $H_1$ (A). The resulting plot shows
the maximal persistence $\Pi_1(n)$ against $\Delta_1(n) = \log n / \log\log n$. For each value of $n$,
we repeated the experiment several times. Here, we also plot the best linear fit with
the constant 0.88. We also show the results for $H_1$ when $d = 3$ (B), $H_2$ when $d = 3$
(C), and $H_2$ when $d = 4$ (D). We note that we performed the same tests for the
Rips filtration and the results were the same (but with different slopes).
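For reference, the scaling term on the horizontal axis is straightforward to evaluate. The sketch below assumes $\Delta_k(n) = (\log n / \log\log n)^{1/k}$ with natural logarithms; the helper names are hypothetical and the normalization simply divides observed maximal persistence values by $\Delta_k(n)$.

```python
import math
from typing import List, Sequence


def delta_k(n: float, k: int) -> float:
    """Scaling term (log n / log log n)^(1/k); requires n > e."""
    log_n = math.log(n)
    return (log_n / math.log(log_n)) ** (1.0 / k)


def normalized_persistence(values: Sequence[float], n: float, k: int) -> List[float]:
    """Divide observed maximal persistence values by delta_k(n, k)."""
    scale = delta_k(n, k)
    return [v / scale for v in values]
```

The term grows very slowly: for $k = 1$ it is roughly 5.26 at $n = 10^6$, which is one reason a wide range of $n$ is needed to see the linear trend.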

There are two particularities in these plots - the first is that the spread is large
for any one value of $n$. While it follows the straight line well it does not seem to
converge to a single value. However, the resulting distributions do seem to converge,
albeit slowly, as can be seen in Figure 6. The histograms (A), (B), and (C) present
the resulting ratio for 400, 2000, and 2,000,000 points, respectively. While there is a
deviation, the distribution does become more concentrated around its peak.


Figure 6. Histograms of empirical $\Pi_1(n)$ in 2D normalized by $\Delta_1(n)$
for (A) 400 points (B) 2000 points (C) $2 \times 10^6$ points.

The second issue is that at smaller $n$, the maximum value drops off faster than
linearly. This can be seen particularly in Figure 5 (B). This phenomenon could
be explained by saying that $n$ is simply not large enough for the limiting behavior
to apply. Nevertheless, we tried to investigate this issue further, by considering the
Čech complex on the flat torus ($\mathbb{T}^2$) obtained by identifying the edges of the unit square.
This part was computed using the periodic triangulations provided in CGAL [30].
We generated points in the unit square and then computed the maximal persistence
using the Euclidean metric (i.e. the standard case) and using the metric on the flat
torus. This was repeated 100 times for each value of $n$. We computed the mean
and standard deviation for each value and show the results in Figure 7. The red line
shows the mean for the unit square. The red shaded region shows the interval of the
mean +/- the standard deviation. The blue line (and the blue region) are the mean
(and standard deviation) for the maximal persistence on the flat torus. The purple
region is the region where the blue and red regions overlap. As can be seen, for most $n$
the maximal persistence is identical, indicating that the longest lived cycles did not
occur near the boundary. The difference is only visible for small values of $n$ (where
there are only a few points). At low values of $n$, the results on the torus demonstrate
a more linear behavior. This provides strong evidence that the non-linearity is due
to boundary effects.

For the case of the flat torus, there are two essential one dimensional homology 
classes (cycles with infinite persistence) corresponding to the generators of the torus. 
For the above results, we ignore the essential classes. 
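The only change needed to pass from the unit square to the flat torus is the metric. The following is a minimal sketch of the distance on the flat torus obtained from the unit cube by identifying opposite faces; the function name `torus_distance` is hypothetical, and the experiments above relied on CGAL's periodic triangulations rather than on code of this form.

```python
import math
from typing import Sequence


def torus_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance between x and y on the flat torus [0,1]^d with opposite faces glued."""
    total = 0.0
    for a, b in zip(x, y):
        diff = abs(a - b) % 1.0
        diff = min(diff, 1.0 - diff)  # go around the shorter way in each coordinate
        total += diff * diff
    return math.sqrt(total)
```

For instance, the points $(0.05, 0.5)$ and $(0.95, 0.5)$ are at Euclidean distance 0.9 in the square but at distance 0.1 on the torus, so points near opposite edges become neighbors and the boundary disappears.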

8. Conclusion 

In this paper we examined the maximum persistence of cycles in the persistent
homology of either the random Čech or Rips complexes, generated by a homogeneous
Poisson process in the unit cube. We showed that with high probability we have
$\Pi_k(n) \sim \left(\frac{\log n}{\log\log n}\right)^{1/k}$. This paper proves that upper and lower bounds exist, and
it remains future work to prove stronger limiting theorems such as a law of large
numbers or a central limit theorem for $\Pi_k(n)$.

We note that while we focused on the Poisson process on the cube for simplicity,
similar results can be proved with minor adjustments for non-homogeneous Poisson
processes as well, and for many compact spaces other than the cube (for example,
compact Riemannian manifolds). The scale of the maximum persistence should be
the same ($\Delta_k(n)$), but the exact constants will be different. An important observation
in this case is that $\Pi_k(n)$ should be defined as the maximum persistence among
all the “small” cycles, i.e. ignoring the cycles that belong to the homology of the
underlying space. Recall that these small cycles are considered the noise in various
TDA applications. Thus, revealing their distribution would be an important first
step in performing noise filtering or reduction. At this point we would like to offer
the following insight related to the “signal to noise ratio” (SNR) in this kind of
topological inference problem.

Figure 7. The effect of boundaries is larger for a small number of
points. The plot shows the mean maximum persistence for $H_1$ as
a function of $\log n / \log\log n$ with the shaded region showing the interval
corresponding to +/- the standard deviation. The red line (and the
red shaded region) shows the maximum persistence in the unit square,
while the blue line shows the maximum persistence for the same point
set in the flat torus. The purple region shows that for most values of
$n$, the value of maximal persistence is the same in both cases. This is
illustrated by an equal mean as well as the overlapping shaded regions
(shown as purple). In (A), we see the plot up to several thousand
points, while in (B) we show a close-up for small values of $n$, where
the results differ.

Suppose that the samples are generated from a distribution on a compact manifold
$\mathcal{M}$, and our interest is in recovering its homology $H_k(\mathcal{M})$. The cycles that belong
to the homology of $\mathcal{M}$ will show up in the Čech complex at some radius, and we can
denote by $\Pi_k^{\mathcal{M}}(n)$ the minimal persistence of these cycles (in the Čech filtration).
One question we might ask is - how do the signal and the noise compare? In other
words - what can we say about $\Pi_k^{\mathcal{M}}(n)/\Pi_k(n)$?

The analysis we have so far already offers a preliminary answer to this question.
For every cycle $\gamma$ that belongs to the homology of $\mathcal{M}$ we know two things: (a)
$\gamma_{\mathrm{death}}$ is approximately constant (depending on the geometry of $\mathcal{M}$), and (b)
$\gamma_{\mathrm{birth}} \le C\left(\frac{\log n}{n}\right)^{1/d}$ (since there are no more changes in homology past coverage, see Theorem
4.9 in HU]). Therefore, we can conclude that 


$$\Pi_k^{\mathcal{M}}(n) \ge C\left(\frac{n}{\log n}\right)^{1/d}.$$

Combining this bound with our bound for $\Pi_k(n)$ we have, for example, that for any
$\epsilon > 0$,

$$\frac{\Pi_k^{\mathcal{M}}(n)}{\Pi_k(n)} \ge n^{1/d - \epsilon}.$$

To get a better estimate for this ratio, we will need to refine our results for $\Pi_k(n)$, as
well as to make more precise statements about the birth times of cycles that belong
to $\mathcal{M}$ (instead of using a crude upper bound).

To conclude, we believe that the results in this paper are a promising lead in the
direction of noise filtering for topological inference, and will be very useful for future
analysis of probabilistic models in TDA.